-
-
-
-
-
-
- ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
- ░░░░░░░░░░░░░░░ ░░░░░░░░░░░░░░░░
- ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ PROFESSIONAL OPTICAL CHARACTER RECOGNITION ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
- ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ PRO-CR(tm) ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
- ░░░░░░░░░░░░░░░ ░░░░░░░░░░░░░░░░
- ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
-
-
- Copyright 1989, 1990, 1991, David P. Gray, Gray Design Associates
-
- All Rights Reserved
-
-
- Member, Association of Shareware Professionals
-
-
-
-
-
- ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ C O N T E N T S ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
-
-
- 1. Specification
- 2. System Requirements
- 3. Files Distributed
- 4. Revision History
-
- 5. USER GUIDE
- 5.1 Start-Up Procedure
- 5.2 Using the Menus
- 5.3 Top Level Menu
- 5.4 Options Menu
- 5.5 Getting a Good Scan
- 5.6 The Sample Image
- 5.7 Error Messages
-
- 6. Program Self-Check
- 7. Comments to the Author
- 8. Association of Shareware Professionals
- 9. Miscellaneous
-
-
-
-
-
- ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 1. SPECIFICATION ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
-
-
- * Reads 8- to 30-point monospaced and proportional fonts
- * No font selection required
- * Up to 260 words per minute
- * Supports HP ScanJet directly
- * Supports other scanners producing TIFF or PCX files
- * Selectable resolution, including 200 (fax) and 300 dpi
- * Preview and online-correction modes (require a graphics adapter)
- * Real-time viewing of text during processing
- * Continuous scanning when an automatic document feeder is attached
- * Mis-recognitions flagged with a selectable character
- * Menu-driven or non-interactive mode from the DOS command line
- * Callable from within other programs (Requires License)
-
-
-
-
-
- ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 2. SYSTEM REQUIREMENTS ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
-
- PRO-CR(tm) performs Optical Character Recognition on an IBM PC or compatible.
- The program runs on an XT or an AT; however, an AT is strongly recommended
- because the program is highly CPU-intensive.
-
- A graphics adapter is not required for basic character recognition, but is
- needed for the preview and online-correction functions. If a graphics adapter
- is used, it should be CGA, HERCULES, EGA or VGA. (Note for Hercules users:
- a Microsoft Hercules driver is included. Run MSHERC.COM once before running
- PRO-CR(tm)).
-
- The minimum memory requirement is about 100 KB (512 KB is recommended), but
- the program adapts itself to use as much conventional memory as is available.
- A temporary disk file serves as virtual memory for images too large to fit
- into memory at once.
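-
- The paging scheme described above can be pictured with a minimal C sketch.
- It assumes the image is held as packed 1-bit scanlines of a fixed byte
- width, and that rows which do not fit in memory are written to a file
- created with the standard tmpfile() call and read back by row number. The
- RowStore type and the rowstore_open, spill_row, load_row and rowstore_close
- functions are hypothetical; the manual does not describe PRO-CR(tm)'s own
- scheme.

```c
#include <stdio.h>

/* Hypothetical row store: fixed-size scanlines paged to a temporary file. */
typedef struct {
    FILE *fp;          /* temporary file from tmpfile(), removed on close */
    size_t row_bytes;  /* bytes per packed 1-bit scanline */
} RowStore;

/* Create the backing file. Returns 1 on success, 0 on failure. */
int rowstore_open(RowStore *rs, size_t row_bytes)
{
    rs->fp = tmpfile();
    rs->row_bytes = row_bytes;
    return rs->fp != NULL;
}

/* Write scanline 'row' at its fixed offset. Returns 1 on success. */
int spill_row(RowStore *rs, long row, const unsigned char *data)
{
    if (fseek(rs->fp, row * (long)rs->row_bytes, SEEK_SET) != 0)
        return 0;
    return fwrite(data, 1, rs->row_bytes, rs->fp) == rs->row_bytes;
}

/* Read scanline 'row' back into 'data'. Returns 1 on success. */
int load_row(RowStore *rs, long row, unsigned char *data)
{
    if (fseek(rs->fp, row * (long)rs->row_bytes, SEEK_SET) != 0)
        return 0;
    return fread(data, 1, rs->row_bytes, rs->fp) == rs->row_bytes;
}

/* Close the store; files made by tmpfile() are deleted automatically. */
void rowstore_close(RowStore *rs)
{
    if (rs->fp != NULL)
        fclose(rs->fp);
    rs->fp = NULL;
}
```

- Seeking before every read and write also satisfies the C rule that an
- update stream needs a positioning call between output and input.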
-
-
-
-
-
- ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 3. FILES DISTRIBUTED ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
-
-
- OCR.EXE : The PRO-CR(tm) program
- README.DOC : Important information
- HELP1.DOC : Text file used for online help
- HELP2.DOC : Text file used for online help
- MANUAL.DOC : This file
- SAMPLE.TIF : Example TIFF file for processing
- MSHERC.COM : Hercules driver
-
-
-
-
-
- ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 4. REVISION HISTORY ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
-
-
- 1.0 May 16 1989 : Baseline version.
- 1.01 May 18 1989 : Fixed character editing in font edit
- function, caused by bug in compiler's
- loop optimizer.
- 1.02 May 31 1989 : Don't reject TIFFs with no
- bits_per_sample tag. Assume a value of 1.
- 1.03 Jun 19 1989 : Don't reject TIFFs with no resolution
- tags.
- 1.04 Aug 28 1989 : Fixed bug in learn-mode.
- 1.05 Nov 29 1989 : Fixed bug in Auto sheet feeder control.
-
- 2.00 Apr 30 1990 : Removed font-dependence. Rewrote
- recognition algorithms. Removed learn
- and edit functions. Added preview and
- online correction functions. Added
- support for 200 dpi, compressed TIFFs
- and PCX files. Added "unknown" char
- feature. Installed program self-check.
- Added DOS command line interface.
- 2.01 Feb 10 1991 : Removed nuisance delay screen.
- Fixed problems with command line mode.
- 2.02 May 14 1991 : Added 1-800 telephone number.
-
-
-
-
-
- ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 5. U S E R G U I D E ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
-
-
-
-
-
- ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 5.1 START-UP PROCEDURE ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
-
-
- 1. For the interactive Menu driven interface:
- From the DOS prompt, type: ocr
- (but see also the options below)
-
- 2. For the non-interactive DOS command line interface:
- From the DOS prompt, type ocr followed by any of these options:
- ocr -exec -in=IN -out=OUT -unk=X -dpi=NNN -mode=M
-
- Note that all of the above options except -exec can also be used when
- starting up with method 1 above. The -exec option determines whether
- PRO-CR(tm) starts up in menu mode or in non-interactive mode.
- None of the options are mandatory.
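-
- The option syntax above is simple enough to parse by hand. The following C
- sketch assumes the arguments arrive in argv exactly as typed. The OcrOptions
- type, the parse_options function and the default values it fills in
- (unknown character '~', 200 dpi, automatic mode choice) are illustrative
- assumptions; the manual does not state PRO-CR(tm)'s defaults.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the documented command-line options. */
typedef struct {
    int exec;          /* -exec: run non-interactively */
    const char *in;    /* -in=IN: file name, SCANNER or AUTO */
    const char *out;   /* -out=OUT: output text file */
    char unk;          /* -unk=X: character flagging mis-recognitions */
    int dpi;           /* -dpi=NNN: resolution in dots per inch */
    int mode;          /* -mode=M: graphics mode, -1 = choose automatically */
} OcrOptions;

/* If 'arg' begins with 'prefix', return the text after it, else NULL. */
static const char *opt_value(const char *arg, const char *prefix)
{
    size_t n = strlen(prefix);
    return strncmp(arg, prefix, n) == 0 ? arg + n : NULL;
}

/* Parse "-key=value" arguments. Returns 0 on success, otherwise the
   argv index of the first unrecognized or invalid argument. */
int parse_options(int argc, char **argv, OcrOptions *o)
{
    const char *v;
    int i;

    /* Assumed defaults; the manual does not specify them. */
    o->exec = 0;
    o->in = NULL;
    o->out = NULL;
    o->unk = '~';
    o->dpi = 200;
    o->mode = -1;

    for (i = 1; i < argc; i++) {
        const char *a = argv[i];
        if (strcmp(a, "-exec") == 0) {
            o->exec = 1;
        } else if ((v = opt_value(a, "-in=")) != NULL) {
            o->in = v;
        } else if ((v = opt_value(a, "-out=")) != NULL) {
            o->out = v;
        } else if ((v = opt_value(a, "-unk=")) != NULL) {
            if (v[0] == '\0' || v[1] != '\0')
                return i;              /* exactly one character expected */
            o->unk = v[0];
        } else if ((v = opt_value(a, "-dpi=")) != NULL) {
            o->dpi = atoi(v);
            if (o->dpi < 100 || o->dpi > 400)
                return i;              /* documented range is 100..400 */
        } else if ((v = opt_value(a, "-mode=")) != NULL) {
            o->mode = atoi(v);
        } else {
            return i;
        }
    }
    return 0;
}
```

- A caller would then compare o.in with "SCANNER" or "AUTO" using strcmp,
- which is case-sensitive, consistent with the upper-case rule given for IN.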
-
-
- IN.................is the name of the image source to be processed. This may
- be the name of an image file produced by your scanner software, e.g. image.tif
- or fred.pcx. Both compressed and uncompressed TIFF files are supported; the
- TIFF compression methods supported are types 2 and 3 (CCITT group 3 FAX) and
- 32773 (Macintosh PACKBITS). Alternatively, enter the word SCANNER to process
- directly from your scanner. HP ScanJet owners with the automatic document
- feeder can enter AUTO to process directly using the feeder; scanning then
- continues until the feeder runs out of pages.
- (Note: The keywords SCANNER and AUTO must be entered in upper case.)
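-
- PackBits, compression type 32773, is a simple byte-oriented run-length
- scheme, and a decoder fits in a few lines. This C sketch follows the scheme
- as published in the TIFF 6.0 specification. The function name
- packbits_decode is hypothetical, and it decodes a single buffer rather than
- walking the strips of a real TIFF file.

```c
#include <stddef.h>

/* Decode PackBits data (TIFF compression 32773) from 'src' into 'dst'.
   Returns the number of bytes written, or -1 if the input is truncated
   or 'dst' is too small. */
long packbits_decode(const unsigned char *src, size_t src_len,
                     unsigned char *dst, size_t dst_cap)
{
    size_t si = 0, di = 0;

    while (si < src_len) {
        /* Header byte, interpreted as a signed value -128..127. */
        int n = src[si] < 128 ? src[si] : src[si] - 256;
        si++;
        if (n >= 0) {
            /* 0..127: copy the next n+1 bytes literally. */
            size_t count = (size_t)n + 1;
            if (si + count > src_len || di + count > dst_cap)
                return -1;
            while (count-- > 0)
                dst[di++] = src[si++];
        } else if (n != -128) {
            /* -127..-1: repeat the next byte 1-n times. */
            size_t count = (size_t)(1 - n);
            if (si >= src_len || di + count > dst_cap)
                return -1;
            while (count-- > 0)
                dst[di++] = src[si];
            si++;
        }
        /* -128 is a no-op. */
    }
    return (long)di;
}
```

- Fed the worked example from the TIFF specification
- (FE AA 02 80 00 2A FD AA 03 80 00 2A 22 F7 AA), it produces 24 bytes.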
-
- OUT................is the name of the file to write the processed ASCII text to.
- If this file does not exist, it will be created. If this file does exist, text
- will be appended to it so that you may scan several times to the same file.
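-
- In C terms, this create-or-append behavior corresponds to opening the output
- file with fopen mode "a". A minimal sketch; append_text is a hypothetical
- helper, not part of PRO-CR(tm):

```c
#include <stdio.h>

/* Append 'text' to the file at 'path'. Mode "a" creates the file if it
   does not exist and makes every write go to the end of the file.
   Returns 1 on success, 0 on failure. */
int append_text(const char *path, const char *text)
{
    FILE *fp = fopen(path, "a");
    int ok;

    if (fp == NULL)
        return 0;
    ok = fputs(text, fp) != EOF;
    return fclose(fp) == 0 && ok;
}
```

- Two successive scans written this way leave both results in the file, the
- second after the first.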
-
- X..................is the character to output when a mis-recognition occurs. A
- suitable character would be "*" or "~". Not all mis-recognitions can be
- flagged in this way; there will always be some cases where PRO-CR(tm)
- believes it has identified a character correctly when it has not.
-
- NNN................is the resolution to use when scanning directly from your
- scanner or when processing a PCX image file (the PCX format does not provide
- usable resolution information). The resolution is entered in dots per inch.
- The allowed range is 100 to 400; see the DPI option in section 5.4 for help
- on choosing the correct dpi.
-
- M..................is the graphics mode number to use. PRO-CR(tm) inspects
- your graphics adapter and selects a suitable graphics mode for the preview
- and online-correction functions. In general, it chooses the mode with the
- most colors among those at least 640 pixels wide. You may force a different
- mode with the -mode option. The IBM graphics modes are:
-
- _TEXTBW40 0 /* 40-column text, 16 grey */
- _TEXTC40 1 /* 40-column text, 16/8 color */
- _TEXTBW80 2 /* 80-column text, 16 grey */
- _TEXTC80 3 /* 80-column text, 16/8 color */
- _MRES4COLOR 4 /* 320 x 200, 4 color */
- _MRESNOCOLOR 5 /* 320 x 200, 4 grey */
- _HRESBW 6 /* 640 x 200, BW */
- _TEXTMONO 7 /* 80-column text, BW */
- _HERCMONO 8 /* 720 x 348, BW for HGC */
- _MRES16COLOR 13 /* 320 x 200, 16 color */
- _HRES16COLOR 14 /* 640 x 200, 16 color */
- _ERESNOCOLOR 15 /* 640 x 350, BW */
- _ERESCOLOR 16 /* 640 x 350, 4 or 16 color */
- _VRES2COLOR 17 /* 640 x 480, BW */
- _VRES16COLOR 18 /* 640 x 480, 16 color */
- _MRES256COLOR 19 /* 320 x 200, 256 color */
- _ORESCOLOR 64 /* 640 x 400, 1 of 16 colors (Olivetti) */
-
- Note that only modes with 80-column text or at least 640 pixels horizontally
- produce a readable display.
-
- Hercules users (mode 8) may need to run the supplied driver MSHERC.COM
- once before running PRO-CR(tm).
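-
- The selection rule described for M can be sketched in C. The widths,
- heights and color counts come from the table; mode 16 is counted as 16
- colors and mode 64 as 2, and ties between equally colorful modes go to the
- taller mode. Those choices, the list of modes the adapter supports (passed
- in by the caller), and the choose_mode function itself are assumptions, not
- PRO-CR(tm)'s documented logic.

```c
#include <stddef.h>

/* Graphics modes from the table, with width, height and color count.
   Mode 16 is counted as 16 colors and mode 64 ("1 of 16 colors") as 2;
   both are assumptions. Text modes are omitted. */
typedef struct { int mode, width, height, colors; } GfxMode;

static const GfxMode kModes[] = {
    { 4, 320, 200,   4}, { 5, 320, 200,   4}, { 6, 640, 200,   2},
    { 8, 720, 348,   2}, {13, 320, 200,  16}, {14, 640, 200,  16},
    {15, 640, 350,   2}, {16, 640, 350,  16}, {17, 640, 480,   2},
    {18, 640, 480,  16}, {19, 320, 200, 256}, {64, 640, 400,   2}
};

/* Return nonzero if 'mode' appears in the caller's list of supported modes. */
static int is_supported(int mode, const int *supported, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (supported[i] == mode)
            return 1;
    return 0;
}

/* Pick the supported mode with the most colors among those at least 640
   pixels wide; ties go to the taller mode. Returns -1 if none qualifies. */
int choose_mode(const int *supported, size_t n_supported)
{
    const GfxMode *best = NULL;
    size_t i;

    for (i = 0; i < sizeof kModes / sizeof kModes[0]; i++) {
        const GfxMode *m = &kModes[i];
        if (m->width < 640 || !is_supported(m->mode, supported, n_supported))
            continue;
        if (best == NULL || m->colors > best->colors ||
            (m->colors == best->colors && m->height > best->height))
            best = m;
    }
    return best != NULL ? best->mode : -1;
}
```

- Given the list {4, 5, 6, 13, 14, 15, 16, 17, 18, 19}, as a VGA adapter
- might report, this picks mode 18 (640 x 480, 16 color); without mode 17 and
- 18 it falls back to 16; a Hercules card offering only mode 8 yields 8.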
-
- Notice: PRO-CR(tm) may NOT be incorporated into any other program (i.e.
- called from within any other program or batch file) and redistributed
- without the express written permission of the author.
-
-
-
-
-
- ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 5.2 USING THE MENUS ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
-
-
- The following paragraphs show how to navigate PRO-CR's menus and read the
- status line at the bottom of the display.
-
-
- THE MENU BARS......Use the arrow keys to highlight the required option on the
- menu bar and hit Return. Alternatively, hit the highlighted letter of an
- option to select it directly.
-
-
- THE STATUS LINE....The last line of the screen shows the parameters you have
- selected in the options menu and the percentage completion of the current
- scan. It looks like this:
-
- 0% <ESC=abort> input_file->output_file dpi=200 <PC*> SCANJET
-
- <ESC=abort> indicates that you may hit the Escape key during processing
- to abort the recognition phase.
-
- The next field, "input_file->output_file", shows both the source of the
- image to be processed (an image file name, or "SCANNER" for a direct
- scan) and the output text file that receives the processed ASCII text.
-
- Next is the resolution in dots per inch. This must be set if scanning
- directly or reading from a PCX file.
-
- Next is a block of 3 parameters inside angle brackets <PC*>. The "P", if
- present, indicates that the image "preview" function is active. The "C",
- if present, indicates that the online correction function is active. The
- final character is the "unknown character", i.e. the character to output
- if a mis-recognition occurs. See the help text for the options menu for
- details of these functions.
-
- Finally, the last field indicates the selected scanner type.
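-
- As an illustration of the status-line layout, here is a C sketch that
- builds the line with snprintf. The format_status function is hypothetical,
- and the manual does not say how an inactive P or C flag is drawn; the
- sketch assumes '-'.

```c
#include <stdio.h>

/* Build a status line in the layout shown above. An inactive P or C
   flag is drawn as '-' here (an assumption). Returns the length of the
   full line, as snprintf does. */
int format_status(char *buf, size_t cap, int percent,
                  const char *in, const char *out, int dpi,
                  int preview, int correct, char unk, const char *scanner)
{
    return snprintf(buf, cap, "%d%% <ESC=abort> %s->%s dpi=%d <%c%c%c> %s",
                    percent, in, out, dpi,
                    preview ? 'P' : '-', correct ? 'C' : '-', unk, scanner);
}
```

- Called with 0%, input_file, output_file, 200 dpi, both flags on, '*' and
- SCANJET, it reproduces the sample status line shown above.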
-
-
-
-
-
- ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 5.3 TOP LEVEL MENU ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
-
-
- Top level menu options:
-
- Select "Start" to start the recognition process.
-
- Select "Auto" to start recognition using the auto sheet feeder.
- (Note: this option only appears if you are scanning directly and have the
- HP ScanJet auto sheet feeder attached.)
-
- Select "Options" to set up parameters for the recognition process.
- Options that you need to set up are:
- - the source of the image to be processed, either a file or the scanner
- - the resolution of the image in dots per inch (dpi)
-
- Note that the dpi is usually available automatically when you process a TIFF
- file, since most TIFF software records it in the file (a warning is given
- for TIFF files that do not). Otherwise, if you are scanning directly or
- reading a PCX file, you need to set the dpi correctly yourself.
- PRO-CR(tm) has been optimized for the range 200 to 300 dots per inch.
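-
- In a TIFF file the resolution comes from the XResolution and YResolution
- tags, which the TIFF 6.0 specification stores as fractions (numerator over
- denominator), qualified by the ResolutionUnit tag (1 = no absolute unit,
- 2 = inch, the default, 3 = centimeter). A C sketch of the conversion to dpi
- follows; tiff_resolution_to_dpi is a hypothetical name, and how PRO-CR(tm)
- itself treats centimeter units is not documented.

```c
/* Convert a TIFF XResolution or YResolution value, stored as the
   fraction num/den, to dots per inch using the ResolutionUnit tag
   (1 = no absolute unit, 2 = inch, 3 = centimeter; pass 2 when the tag
   is absent, as the TIFF default). Returns 0 when no dpi can be derived,
   in which case the user's dpi setting would have to be used. */
int tiff_resolution_to_dpi(unsigned long num, unsigned long den, int unit)
{
    double per_unit;

    if (num == 0 || den == 0)
        return 0;
    per_unit = (double)num / (double)den;
    switch (unit) {
    case 2:
        return (int)(per_unit + 0.5);         /* already per inch */
    case 3:
        return (int)(per_unit * 2.54 + 0.5);  /* 2.54 cm per inch */
    default:
        return 0;                             /* unit 1 or unknown */
    }
}
```

- For example, 118 dots per centimeter works out to 300 dpi after rounding.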
-
-
-
-
-
- ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 5.4 OPTIONS MENU ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
-
-
- Select from the following options:
-
- INPUT..............Enter the source of the image to be processed. This may
- be either directly from your scanner (if it is one of the supported types)
- or an intermediate image file produced by your scanner software. To select
- direct scanning, enter the word "SCANNER" here. To select a file, enter the
- name of the image file produced by your scanner software. Currently
- supported types are TIFF (both compressed and uncompressed) and PCX
- (PC Paintbrush) files. Be sure to include the extension in the file name
- (i.e. .TIF or .PCX) so that PRO-CR(tm) can determine the file type. For PCX
- files you must also set the resolution at which the image was scanned,
- since most PCX-producing software does not include it in the file. Use the
- "dpi" option to set this.
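-
- PCX images store their pixels with a simple run-length code defined in
- ZSoft's PCX format description: a byte whose top two bits are both set is a
- count (its low six bits) for the byte that follows, and any other byte
- stands for itself. A C sketch of that decoder, under the hypothetical name
- pcx_rle_decode; it ignores the 128-byte file header and per-line layout.

```c
#include <stddef.h>

/* Decode PCX run-length data from 'src' into 'dst'. A byte with both
   top bits set (0xC0..0xFF) is a run marker whose low six bits give
   how many times to repeat the following byte; any other byte is a
   single literal. Returns bytes written, or -1 if the input ends inside
   a run or 'dst' is too small. */
long pcx_rle_decode(const unsigned char *src, size_t src_len,
                    unsigned char *dst, size_t dst_cap)
{
    size_t si = 0, di = 0;

    while (si < src_len) {
        unsigned char b = src[si++];
        size_t count = 1;

        if ((b & 0xC0) == 0xC0) {
            count = b & 0x3F;        /* run length */
            if (si >= src_len)
                return -1;           /* marker without a value byte */
            b = src[si++];           /* value to repeat */
        }
        if (di + count > dst_cap)
            return -1;
        while (count-- > 0)
            dst[di++] = b;
    }
    return (long)di;
}
```

- Because the marker uses the top two bits, a single data byte of 0xC0 or
- above has to be encoded as a run of one (for example C1 C5).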
-
-
- OUTPUT.............Enter the name of the file into which PRO-CR(tm) should write
- the ASCII text. PRO-CR(tm) creates the file if it does not already exist, and
- appends text to it if it does.
-
-
- DPI................Enter the resolution in dots per inch. This matters in two
- cases: (1) when you are scanning directly, it tells the scanner what
- resolution to scan at; (2) when you are processing an image file that does not
- contain the image resolution. This is always true of PCX files and sometimes
- true of naughty TIFF files. (It is rare for TIFF software to omit the
- resolution, and you may assume it is present unless PRO-CR(tm) warns you
- otherwise. Never assume a PCX file contains the image resolution, though!)
-
- Choosing the right dpi for DIRECT SCANNING:
- PRO-CR(tm) works best at 200 to 300 dpi. 200 dpi covers the fax
- standard and most hand scanners. (Some popular hand scanners claim 300
- and 400 dpi modes; however, many simply duplicate pixels to match
- printer resolutions and therefore add no information to the scan for
- OCR purposes.) For reasonably sized text 200 dpi should be adequate, but
- if the text is tightly spaced (characters run together), 300 dpi should
- help. PRO-CR(tm) allows a range of 100 to 400 dpi for experimentation.
- In general, try 200 dpi first and then 300.
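-
- A quick way to see what a given dpi means for small type: a point is 1/72
- inch, so text set at S points spans roughly S / 72 x dpi pixels from the
- top of its tallest letters to the bottom of its descenders (the nominal body
- size; individual letters are smaller). A C sketch of this arithmetic, with
- a hypothetical function name:

```c
/* Nominal text height in scanner pixels: 'points' / 72 inches times
   'dpi' dots per inch. This is the body size; letters are smaller. */
double points_to_pixels(double points, int dpi)
{
    return points / 72.0 * (double)dpi;
}
```

- For the smallest supported size, 8 points, this gives about 22 pixels at
- 200 dpi and about 33 at 300 dpi, so the higher setting provides half as
- many pixels again for each character and the gaps between characters.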
-
-
- UNKNOWN CHAR.......Enter the character you wish to be embedded in the output
- text to flag characters that could not be recognized. This is useful if you
- use an editor after the recognition phase to correct mis-recognitions. It
- allows you to search for this character to speed up the correction phase.
- Use of a spelling checker should also prove helpful.
-
-
- PREVIEW............This function is active if a check mark appears next to it
- in the menu; selecting it repeatedly toggles it on and off. When active, as
- much of the image as fits on the screen is displayed before it is processed.
- This is useful for checking the quality of the scan: skewed paper, poor
- contrast, unsuitable resolution, etc. This function is only available to
- users with a suitable graphics adapter.
-
-
- CORRECT...........This function is active if a check mark appears next to it
- in the menu. The function is toggled on and off by repeatedly selecting it.
- When active, you will be allowed to correct mis-recognitions online during the
- OCR processing phase. You will be prompted for the correct character(s) when
- a mis-recognition occurs. At the lower right hand side of the screen an image
- of the mis-recognized character(s) will appear for you to correct. If you are
- unable to recognize the text, hit return and the text will be replaced by the
- "unknown" character mentioned above. You may also hit Escape to abort the
- processing. This function is only available to users with a suitable graphics
- adapter.
-
- SCANNER TYPE......Select your scanner type from the list. At the time of
- writing, only the HP ScanJet is directly supported; however, I have had a
- large number of requests for the Logitech ScanMan and plan to support it.
-
-
-
-
-
- ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 5.5 GETTING A GOOD SCAN ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
-
-
- To get the best scan, several factors must be taken into account.
-
- 1. Make sure the text is scanned in a straight line. This is especially
- crucial for users of hand scanners. The lines of text should appear as
- horizontal and level as possible when previewed. Skewed or rotated lines
- not only impair recognition accuracy but may also cause PRO-CR(tm) to run
- lines together by mistake. This is less of a problem for flatbed scanners,
- which have guides for inserting the paper. Even so, some printers print
- sloping lines that are not perpendicular to the sides of the paper. Hand
- scanners have the additional problem of matching up "strips" when scanning
- in several passes; in this case I suggest processing one strip at a time.
-
- 2. Make sure the scanner's contrast is set to a suitable value. This is not a
- problem for the HP ScanJet, which has an auto-contrast feature. With other
- scanners, preview the image and look for broken or faded characters or
- extraneous "noise" as a result of poor contrast. Best results are obtained
- when the background is clear and the characters appear sharp.
-
- 3. Select a suitable dpi. Use preview to inspect the image. If possible,
- select a dpi at which the characters appear separate. If too many
- characters run together, recognition accuracy will be impaired. If you
- cannot separate the characters with a higher dpi (inspecting the original
- copy will show that the characters are indeed joined together), select the
- online-correction mode; you will then be prompted to enter the unrecognized
- portion of each scanned line during the recognition phase. If you do not
- select online correction, PRO-CR(tm) will attempt to separate groups of up
- to 3 joined characters. For monospaced fonts this strategy works fairly
- well, but for proportional fonts the results are poorer. Note that for some
- near letter quality dot matrix fonts, lowering the dpi actually improves
- results, because the dots making up each character appear to join up into
- continuous strokes. PRO-CR(tm) cannot read draft dot matrix fonts, since
- the dots making up the characters are disjointed. For the same reason,
- PRO-CR(tm) will not read characters that appear broken: broken characters
- lose the essential features that PRO-CR(tm) relies upon for recognition.
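-
- The manual does not describe how PRO-CR(tm) splits joined characters. For
- monospaced text, one simple approach is to estimate how many character
- pitches a joined blob spans and cut it at evenly spaced columns, never into
- more than 3 pieces. The C sketch below does exactly that; split_joined and
- its pitch input are hypothetical. Its even spacing also illustrates why such
- a strategy suits monospaced fonts better than proportional ones.

```c
/* Hypothetical sketch: split a blob of joined characters 'blob_w' pixels
   wide into at most 3 pieces, assuming a monospaced pitch of 'pitch'
   pixels. Stores the x offsets of the cut columns in 'cuts' and returns
   the number of pieces (1 means leave the blob unsplit). */
int split_joined(int blob_w, int pitch, int cuts[2])
{
    int pieces, i;

    if (pitch <= 0)
        return 1;
    pieces = (blob_w + pitch / 2) / pitch;   /* nearest whole pitch count */
    if (pieces < 1)
        pieces = 1;
    if (pieces > 3)
        pieces = 3;                          /* documented maximum of 3 */
    for (i = 1; i < pieces; i++)
        cuts[i - 1] = blob_w * i / pieces;   /* evenly spaced cut columns */
    return pieces;
}
```

- For example, a 60-pixel blob at a 20-pixel pitch is cut at columns 20 and
- 40, while a 100-pixel blob is capped at 3 pieces, cut at 33 and 66.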
-
-
-
-
-
- ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 5.6 THE SAMPLE IMAGE ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
-
- A sample image is supplied for experimenting with PRO-CR's features. The
- image is in the file SAMPLE.TIF, in compressed TIFF format. Make sure the
- status line shows sample.tif as the image source. Select the online
- correction feature and process the image. Notice that the "mm" in the word
- "common" and the "qu" in the word "quality" are presented for correction,
- because these characters are joined together and cause a mis-recognition.
- After the scan, deselect the online correction feature and reprocess the
- image. Notice that PRO-CR(tm) successfully separates the "mm" on its own but
- fails with the "qu", recognizing only the "q" correctly.
-
-
-
-
-
- ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 5.7 ERROR MESSAGES ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
-
- The following error codes may be reported when reading TIFF files:
-
- 1 ...... Could not find the input file.
- 2 ...... Non-Intel byte order. The TIFF file is possibly a Mac file.
- 3 ...... Wrong value for bits_per_sample tag.
- 4 ...... Unsupported Compressed TIFF file.
- 5 ...... Wrong value for photometric_interpretation tag.
- 6 ...... Wrong value for fill_order tag.
- 7 ...... Wrong picture orientation.
- 8 ...... Wrong value for samples_per_pixel tag.
- 9 ...... Wrong value for minimum_sample tag.
- 10 ...... Wrong value for maximum_sample tag.
- 11 ...... Wrong value for planar_configuration tag.
- 12 ...... Missing bits_per_sample tag.
- 13 ...... Missing image_width tag.
- 14 ...... Missing image_length tag.
- 15 ...... Missing image_pointer tag.
- 16 ...... Missing X_resolution tag.
- 17 ...... Missing Y_resolution tag.
-
- In addition, there are several self-explanatory warnings and other
- error messages.
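-
- Error 2 concerns the byte-order mark at the start of every TIFF file: "II"
- for Intel (little-endian) order and "MM" for Motorola (big-endian) order,
- followed by the number 42 and the offset of the first image file directory,
- as defined in the TIFF 6.0 specification. A C sketch of that check;
- check_tiff_header is a hypothetical name, and returning 2 merely mirrors the
- table's numbering rather than PRO-CR(tm)'s actual code.

```c
#include <stddef.h>

/* Inspect the first 8 bytes of a file. Returns 0 for an Intel ("II")
   TIFF header and stores the first IFD offset in *ifd, 2 for a Motorola
   ("MM") header (error 2 above), or -1 if this is not a TIFF header. */
int check_tiff_header(const unsigned char *h, size_t len, unsigned long *ifd)
{
    if (len < 8)
        return -1;
    if (h[0] == 'M' && h[1] == 'M' && h[2] == 0 && h[3] == 42)
        return 2;                           /* big-endian: rejected */
    if (h[0] != 'I' || h[1] != 'I' || h[2] != 42 || h[3] != 0)
        return -1;                          /* not a TIFF file */
    /* Little-endian 32-bit offset of the first image file directory. */
    *ifd = (unsigned long)h[4] | ((unsigned long)h[5] << 8) |
           ((unsigned long)h[6] << 16) | ((unsigned long)h[7] << 24);
    return 0;
}
```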
-
-
-
-
-
- ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 6. PROGRAM SELF-CHECK ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
-
- PRO-CR(tm) provides a measure of protection against accidental corruption
- during downloads from bulletin boards. A self-check is performed when the
- program starts up, and the user is notified if it fails. Note that the
- program must be run from the directory where it resides, or the self-check
- will fail.
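-
- The manual does not say how the self-check works. One common technique is to
- compute a checksum such as CRC-32 over the program file and compare it with
- a value recorded when the program was built. The C sketch below computes
- the standard CRC-32 of a file; crc32_update and crc32_file are hypothetical
- names, and the stored reference value is not shown.

```c
#include <stdio.h>

/* Standard CRC-32 (reflected polynomial 0xEDB88320, as used by ZIP).
   Pass 0 to start; pass a previous result to continue over more data. */
unsigned long crc32_update(unsigned long crc, const unsigned char *p, size_t n)
{
    int k;

    crc = ~crc & 0xFFFFFFFFUL;
    while (n-- > 0) {
        crc ^= *p++;
        for (k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320UL & (0UL - (crc & 1UL)));
    }
    return ~crc & 0xFFFFFFFFUL;
}

/* CRC-32 of a whole file. Returns 0 and sets *out on success, or -1 if
   the file cannot be opened. */
int crc32_file(const char *path, unsigned long *out)
{
    unsigned char buf[512];
    unsigned long crc = 0;
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        crc = crc32_update(crc, buf, n);
    fclose(fp);
    *out = crc;
    return 0;
}
```

- A check that opens the program's own file by a relative name, e.g.
- crc32_file("OCR.EXE", ...), only succeeds from the program's own directory,
- which is one plausible reason for the restriction noted above.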
-
-
-
-
-
- ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 7. COMMENTS TO THE AUTHOR ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
-
- Any feedback would be greatly appreciated. Please direct any comments to
- the author personally via mail to David P. Gray, Gray Design Associates,
- P.O. Box 333, Northboro, MA 01532, USA.
-
-
-
-
-
- ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 8. ASSOCIATION OF SHAREWARE PROFESSIONALS ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
-
- This software is produced by David P. Gray who is a member of the Association
- of Shareware Professionals (ASP). ASP wants to make sure that the shareware
- principle works for you. If you are unable to resolve a shareware-related
- problem with an ASP member by contacting the member directly, ASP may be able
- to help.
-
- The ASP Ombudsman can help you resolve a dispute or problem with an ASP member,
- but does not provide technical support for members' products. Please write to
- the ASP Ombudsman at 545 Grover Road, Muskegon, MI 49442, USA or send a
- CompuServe message via Easyplex to ASP Ombudsman 70007,3536.
-
-
-
-
-
- ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 9. MISCELLANEOUS ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
-
- HP and ScanJet are registered trade marks of Hewlett-Packard.
- ScanMan is a registered trade mark of Logitech Inc.
-
-
-
-
-
- ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ END OF MANUAL.DOC ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
-